Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
Abstract
We present new language resources for Moroccan and Sanaani Yemeni Arabic. The resources include corpora for each dialect which have been morphologically annotated, and morphological analyzers for each dialect which are derived from these corpora. These are the first sets of resources for Moroccan and Yemeni Arabic. The resources will be made available to the public.
Similar Resources
Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction o...
F0 Alignment Patterns in Arabic Dialects
A comparison of F0 alignment values was carried out for three Arabic dialects (Moroccan Arabic, Kuwaiti Arabic and Yemeni Arabic) using five speakers from each dialect. Clear differences found in alignment enable separation of Moroccan Arabic from the two other dialects: a) values of the F0 valley differed significantly, with Moroccan Arabic showing a later synchronisation than Kuwaiti Arabic a...
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
The Low Resource Language research conducted under DARPA’s Broad Operational Language Translation (BOLT) program required the rapid creation of text corpora of typologically diverse languages (Turkish, Hausa, and Uzbek) which were annotated with morphological information, along with other types of annotation. Since the output of morphological analyzers is a significant aid to morphological anno...
A Large Scale Corpus of Gulf Arabic
Most Arabic natural language processing tools and resources are developed to serve Modern Standard Arabic (MSA), which is the official written language in the Arab World. Some Dialectal Arabic varieties, notably Egyptian Arabic, have received some attention lately and have a growing collection of resources that include annotated corpora and morphological analyzers and taggers. Gulf Arabic, howe...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...